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Abstract 


RISC has become a mainstream movement to improve computing power and keep cost of 
design and design time low. In this thesis a 32 bit VLSI RISC architecture is designed 
and referred as iitk-RISC. The iitk-RISC supports four stage instruction pipeline and has 
128 on-chip registers. This allows a fast operand fetch and simple data management by the 
compiler. The iitk-RISC also supports 4 Giga byte memory space organized in byte banks. 

The processor has been implemented using l.bjLt scalable CMOS c3tu process. It has 
83638 transistors and fits in the areaof 7.85x7.24mm. The simulation results show that iitk- 
RISC can operate at 15MHz. With the large register set, it is possible to avoid memory reads 
and writes to approximately 10% of the program size. This gives an average performance 
of 14MIPS. 
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Chapter 1 

Introduction 


Many factors have influenced the design of a processor. These include availability of VLSI 
technology which made it possible to realize cache, data prefetch, data pipelines, instruction 
pipeline etc. on the processor chip. Using these concepts several processors (RISC, CISC 
and semi RISC) have been introduced in the recent past. Rest of this chapter discusses the 
concept of RISC, how VLSI made RISC a reality, ‘‘iitk-RISC". and the organization of the 
thesis. 


1.1 Reduced Instruction Set Computer 

In late 70s and early 80s general trend in computers was to increase the complexity of 
architectures by providing complex instructions and complex addressing modes. As a result 
the control unit of these processors occupied a good amount of silicon area: see for example, 
control unit of MC68000 is 68% of the chip area [.5]. The complexity of these computers, 
(here onward referred as to CISC) had some negative consequences. These include increased 
design time and errors and increased duplication of resources, where resources stand for 
instruction set and functional units. Due to complexity of CISC even a million transistors 
chip was insufflcient to design a single chip computer. 

This led to a hypothesis that by reducing the number of instructions one can design 
a suitable VLSI architecture that uses a scarce resource, the chip area, more effectively 
than CISC [8]. Therefore the main emphasis in RISC de.sign is to utilize the silicon area as 
optimaly as possible. 

Some of the features of RISC design are as follows: 
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• They supports simple instructions. In RISC instructions are kept as simple as.possible. 
In most of the cases a RISC instruction is equivalent to one micro call in CISC. 

• RISC processors have fixed instruction format. A choice of fixed instruction format 
simplifies the instruction fetch and decode logic. Further registers access can be done 
in parallel with instruction decoding. 

• There are good number of on-chip registers in RISC processor. This enables compiler 
to keep frequently used variables in registers. 

• The core architecture of RISC is LO.\D/STORE type architecture. Only LOAD and 
STORE instructions can access data from outside. Other instructions operate on 
registers as their source of operands and destination for result. 

• All RISC processors execute one instruction per cycle. 

• The instruction set in RISC processor has a good Support for high level languages. 

Due to simple instructions, fi.xed instruction format, and small number of addressing 
modes the control logic of RISC is very simple. The silicon area thus saved is used to 
provide on-chip memory in the form of registers and caches. This reduces the demand on 
critically limited chip bandwidth. The result is increased system throughput. A typical 
RISC processor ha.s up to 128 registers. 

To utilize on-chip resources to their maximum potential the execution sequence is divided 
into several easy to implement subsequences. Those execution subseciuences are so arranged 
that a unit executing a subsequence is ready to accept a subsequence in every cycle. This 
arrangement of execution subsequences is called execution or instruction pipeline. For 
simplified and regularly running pipeline several things are to be considered. For example, 
in case of control transfer instruction, to keep the pipeline full instead of flushing, delayed 
branch technique is utilized. In almost all RISC processors the jump is not be taken until 
an intervening delay instruction has been executed. Advanced Micro Devices(AMD) claims 
that 90% of the time compiler can insert a useful instruction into the "delay slot’ [1]. In 
fact there is scope of other optimizations as well and this will improve the performance of 
the system many fold. 
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mi 


MC68030 

iitk-RISC 

cmp 

4 cycles 

7 cycles 

1 cycle.s 

call 


17 cycles 

1 cycles 



15 cycles 

1 cycles 




mamm 

jmp 

17 cycles 

14 cycles 



Table 1.1: Instruction Execution Times for Some Processors 

To summarize our discussion on RISC and CISC processors, the time required for some 
of the typical instructions is listed in table 1.1 for two CISC processors and compared with 
that required for these instructions in iitk-RISC. 

1,2 Contribution of this Thesis 

In this thesis, iitk-RISC has been designed and implemented at Computer Science and 
Engineering department of IIT KANPUR. It has almost all features of RISC discussed 
above. The iitk-RISC is a 32 bit architecture and consists of a control unit, an execution 
unit, a memory access unit, a write back unit, a register-file, and an interrupt handling unit. 
It supports 4 Giga byte memory space organized in byte banks. It has a four stage pipeline: 
Fetch/Decode and register fetch(IF and DRF), execution(EXEC), Memory access(MEM), 
and Write back(WB). The input clock frequency is 30MHz and it is divided by two internally 
to get the pipeline clock. Jn iitk-RISC an instruction cycle is taken to be same as one clock 
cycle. However, execution cycle is the number of cycle.s required to execute an instruction in 
a non-pipelined architecture. .An execution cycle of iitk-RISC consists of four clock cycles. 

The register-file of iitk-RISC consists of 128 registers of 32 bits each. The execution 
unit consists of a 32 bit adder/subtractor, a logic unit, and a barrel shifter. It is capable of 
doing 32 bit operations in a single cycle. The memory access unit can access a byte, a half 
word, and a full word from memory and data is aligned accordingly. In case of read access of 
memory the read data is converted to either signed 32 bit or unsigned 32 bit depending upon 
the type of read. The write back unit writes data into register-file or flag register or does 
not write as required by the instruction. The Interrupt handling unit monitors the on-chip 
illegal activities and interrupt requests from outside. For example illegal memory address. 
If an illegal activity or an interrupt request is detected then the interrupt' handling unit 
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offest lines. The interrupt, table of iitk-RISC is 2k bytes deep and lies at the bottom of the 
memory map. 

The iitk-RISC instructions are of fi.xed size and fixed format. The instruction .set of iitk- 
RISC has instruction for data movement among the register; instructions for arithmetic, 
logical, and shift operations; instructions to read an unsigned or signed byte, half word, 
and a word from memory; instructions to write a byte, a half word, and a word into the 
memory; and instructions to alter the control flow. Instruction set also includes few privilege 
instructions, for example, write into flag register and normal user is not permitted to e.xecute 
these instructions. 

1.3 VLSI CAD Tools 

Several tools are available for designing and testing ICs by computer without actually 
fabricating them. These design tools consist of tools for simulating circuit at transistor 
and/or gate level, tools for designing layout and preparing mask for fabrication. Several 
fabrication technologies are available these days, for example, NMOS, SCMOS, HMOS, 
CMOS. A technology defines design rules and these rules are to be closely followed while 
designing the ICs. The design tools provide utilities to check the design for possible design 
errors. These rules include the minimum dimensions of layers, minimum separation of two 
layers etc. The design is normally done in terms of scalable unit A as defined by C. A. Mead 
and L. Conway [6]. 

The iitk-RISC is designed with the aid of nelsis-IC design tool [7]. from TU Delft software 
distribution available in Computer Science and Engineering Department, IIT KANPUR. 
The layout of iitk-RISC is designed using interactive layout editor dali with A fixed at 
0.200/r, for 1.6;i techuologj'. .411 circuit simulations are done with SLS (.A. Switch Level 
Simulator) [4] and SPICE [10]. 
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1.4 Thesis Organization 

Chapter 2 reviews some of the existing true RISC and semi RISC architectures. The next 
three chapters deal with the instruction set, architecture, and layout of iitk-RISC. They 
show how iitk-RISC fits into the category of true RISC. 

The thesis concludes with the results that have been obtained and some possible ex- 
tensions to the iitk-RISC. In Appendix A instructions set of iitk-RISC is given. The basic 
modules, that are used in implementation of iitk-RISC, are illustrated in Appendix B. 


Chapter 2 

A Survey of RISC Mainstream 


This chapter discusses some of the RIS(.' processors' design. 7'he processors discussed 
include: RISC I, RISC II. MIPS R2000. inlel’s 809(50. .\MD’s 29000. and SP.A.RC. The 
processors are discussed roughly in ascending order of their architectural complexity. The 
discussion is brief and the design issues to be considered in subsequent chapters are included 
specifically. 

Some of the newly introduced processors have been excluded because these processors 
do not qualify the basic criterion of RISC, namely, simplicity. Though these are claimed to 
be RISC processors by their respective vendors, yet computer architects prefer to call these 
processors Pseudo RISC. 

2.1 The Berkeley RISC I 

The concept of RISC is not new but it was taken scriou.slv only after the RISC I was 
designed at UC Berkeley. The RISC I architecture [8, 5] has 31 instructions, most of which 
do simple ALU and shift operations on registers. Instructions, data, addresses, and registers 
are 32 bits wide. RISC I execution cycle consists of fetching instructions and execution of 
instructions. With two stage pipeline no internal forwarding is required. The register-file 
consists of 138 registers of 32 bits each, though its first model "RISC GOLD" was fabricated 
with 72 registers only. Separate buses are used for reading two registers and writing result 
back. Register- file is partioned into overlapped register-windows of 32 registers. Out of these 
32 registers, Ro . . . Rg are termed as global registers, Rm • • - Ri-s are termed as low registers, 
Ri 6 . . . Rgs are termed as local registers, and Rge • • ■ Rai are termed as high registers. Every 
time a CALL instruction is executed a new window is allocated and the low registers of 
caller’s window become the high registers of callee’s window. On overflow or underflow a 
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trap is generated and necessary slops arc* lak('ii In' tlio intornipt liaiidlor rouliiio to allocate 
registers in memory. For regular and smooth running of |)ipeline the concept of delayed 
branch is used in case of control transfer instructions. 

The layout of RISC I was designed using NMOS technology with A at 2 microns and no 
buried contacts. 

2.2 The Berkeley RISC II 

This is the successor of RISC I and most of its architectural features were copied from 
RISC I. The aspects in which it differs from RISC I are discussed in this section. 

The RISC II CPU [5] has two modes of operation namely u.ser- visible! u-v), and interrupt- 
handling(i-h). Some of the instructions can be e.xecuted in i-h mode only. In case of illegal 
execution of instructions a trap is generated. 

RISC II is a 32 bit architecture. The instruction execution pipeline consists of three 
stages; fetch, compute(ALU), and write back result. The simultaneous read and write to 
the register-file is avoided and only two bus register-filo is u.sed. The register-file consists of 
8 overlapped windows of 32 registers each of which Ro • . . Ro are global registers, Rm . . . Ris 
are input registers, R 16 ...R 25 are local registers, and remaining Rje-.-Rsi are output 
registers. On call instruction a new window is allocated and the current status of window 
is reflected by CWP(current window pointer), and SWP(saved window pointer). These two 
window pointers are the part of PSW( program status word). PSW includes four condition 
flags (zero, negative, overflow, and carry), interrupt enable bit(I). system mode bit(S), and 
previous system mode bit in addition to the CVVP and SWP. Three program counters ar-e 
used, NXTPC contains the address of next instruction to be e.xecuted. PC contains the 
address of the instruction that is in e.xecution process, and LSTPC contains the address of 
interrupted instruction in case of interrupt. Whenever an illegal condition is detected an 
interrupt is raised and content of PC is pushed into LSTPC. 

Three types of instruction formats are supported. These are long immediate^ short 
immediate, and register-register. The instruction set consists of 53 instructions. It includes 
instructions for arithmetic and logical operations; instructions for loading and storing a 
byte, a half word, and a word, control transfer instructions, and some other miscellaneous 
instructions. 
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The layout of RISC II was designed using NMOS technology with A at 2 microns and 
it requires a 4 phase clock. 

2.3 The MIPS R2000 

The MIPS R2000 [9] is a 32 bit architecture. The instruction execution pipeline consists 
of five stages: .IF (instruction fetch), RF( register fetch). MEM(data memory reference), 
WB(write result back). The CPU is logically composed of six synchronized units: Master 
pipeline control unit, Execution unit, .\ddress unit, Translation look-aside buffer. System 
co-processor interface unit, and External interface controller. The Master pipeline control 
latches instruction field off the data bu.s and performs instruction decode. It aJso controls 
the pipeline if any abnormal conditions arise. 

The execution unit is composed of a 32 bit shifter, an incoming data aligner, an arith- 
metic logical unit and, a multiply and divide unit. Multiply and divide unit operates 
autonomously from rest of the processor. Internal bypassing is done, to get the consistent 
result, inside the execution unit. 

The R2000 does not use cdndition codes to govern the control flow. Instead, the CPU 
provides brajiches based directly on simple data comparison and .A.LU operations that di- 
rectly create boolean values in register. The compiler synthe.size.s branch conditions from 
the instructions available to access these boolean values. Delayed branch concept is used 
to avoid bubbles in pipeline. 

All the memory reference instructions are processed in MEM part of the execution cycle. 
The address of data is computed in .A.LU part of execution cycle and data is actually accessed 
in MEM part of execution cycle. For instructions other than memory access instructions 
MEM part of execution cycle is converted into an intermediate latency. The layout of R2000 
WcLS designed using double metal CMOS technology with A at 2 microns. 

2.4 The 80960 RISC Architecture 

Architecturally, 80960 architecture [2] is very different from RISC mainstream. For one 
thing, its design has microcode in a number of places with the main 32x42 bit microcode 
ROM which is used to generate 42 control signals. For another, it has a full IEEE-7.54 
floating point unit complete with its own 80 bit registers. Its instruction set includes a 
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LOCAL( 16x32) 


V 

ALU 


F LOATI NG-POINT UNIT 
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BUS CONTROL LOGIC 


I ADDRESS/DATA 

Figure 2.1: Intel’s 80960 RISC Architecture. 
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FETCH UNIT 
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INSTRUCTION 

DECODER 


MICRO-SEQUENCER 


MICROCODE ROM 
(3Kx42) 


large number of specialized controller-type instructions like boolean logic operations and bit 
manipulation. Some of the architectural details are shown in figure 2.1. It was fabricated 
with 1.5/im using CMOS technology. 

2.5 The 29000: AMD’s RISC Machine 


The computer architects consider ClStl as a computer running inside computer. Dan 
O’Dowd, a computer architect who was responsible for 32000 CISC family from National 
Semiconductor Corp., points out that a RISC performance can be extracted from a CISC 
computer by removing the outer "macrocomputer” level from a CISC, leaving the inner 
microcoding “computer within a computer.” On this idea .AMD introduced this bit slice 
RISC machine. It was fabricated using 1.2/t CMOS technology. 

The architecture of the 29000 is shown in figure 2.2. It is a .32 bit architecture and has 
192 on-chip registers [2]. 
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BRANCH TARGET 


REGLSTER-FILE 

CACHE 
(2 X 64 X 32) 


(3 FORTS, 192x32) 


4-STAGE PIPELINE 

• FETCH 

• DECODE 

• EXECUTE 

• WRITE BACK 


TLB 

(2 X 32 X 64) 



INSTUCTION I ADDRESS 


DATA 


Figure 2.2: The AMD’s 29000, a bit slice RISC Architecture. 


2,6 The SUN’S SPARC 


SPARC {Scalable Processor Architecture) [2] was intended to be a high speed true RISC 
processor unlike its contemporary architectures. The basic architecture of SPARC is shown 
in figure 2.3. The first SPARC was realized in a 20000-gate, 1.5/im gate array from Fijitsu 
Microelectronics Inc. It is a 32 bit machine and has 136 on-chip registers. These registers 
are partioned into overlapped windows and a window is assigned to a task. It was later 
fabricated using O.Sjum CMOS technology. 

2.7 Conclusion 


The concept of RISC processors started with the RISC I architecture and its on-chip reg- 
ister resources, small number of addressing modes, and small number of instructions were 
publicized as a RISC standard. However, the new RISC processors are departing from 
this standard very much. The new RISC processors seem to follow a different line of de- 
sign, closer to their counterpart CISC. In-fact people prefers to call these RISC processors 
Pseudo RISC. Some of the interesting observations about these processors are as follow's: 
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• All are 32 bit architectures. 

• All are pipelined architectures and use a 2 to 4 stage pipeline. 

• The number of on-chip registers is declining as some e.\tra features like MMU, FPU 
etc. are added on-chip. 

• .A.11 architectures use on-chip or off-chip data and instruction caches. 

• The complexity of processors is increasing. 

• The chip area required for fabricating the proce.si;ors is increasing with time. 




Chapter 3 

Instruction Set of iitk-RISC 


There is always a trade off between compactness of code and CPU performance. As mem- 
ory is a critical resource, a compact code will enable smaller devices for handling the same 
amount of compiled code. Here memory devices include secondary memory, primary mem- 
ory, and instruction cache. 

There are two main methods for reducing the average code size. Firstly, an instruction 
format closer to Huffman encoding may be utilized. This means having a variable num- 
ber of fields in the instructions. The choices are made according to the relative usage of 
instruction and fields. Secondly, the approach followed in CISC is that of merging two or 
more instruction into one [5]. 

Both the approaches have their own merits and demerits. We shall not discuss that 
here. The compact code also has some negative consequences, discussed in section 1.1. 

In accordance with the RISC philo.sophy. all iitk-RISC instructions are one word long 
and have fixed format for easy decoding. The meaning of some fields of in an instruction 
format is fixed and can not be altered. The iitk-RISC is a .32 bit architecture and can 
support 4 Giga-byte of memory space. It supports a LO.A.D/STORE type architecture 
where a memory access is done by some specified instructions. Data in memory can be 
stored as a byte, a half word, and a full word. The accessed data is converted into signed or 
unsigned full word before writing into register-file. This offers simplicity and full flexibility 
of supporting different data types. 

This chapter discusses the instruction set of iitk-RISC briefly. A detailed description 
can be found in Appendix A. The instruction set is divided into four broad categories; Data 
Movement Instructions, Data Manipulation Instructions, Control Transfet Instructions, 
and Miscellaneous Instructions. 



SHORT IMMEDIATE : 



REGISTER-REGISTER : 



SCC: SET CONDITION CONTROL 
RSi, RS2; REGISTER OPERANDS 
RD: DESTINATION REGISTER 
SIGN: SIGN BIT FOR IMMEDIATE DATA 

Figure 3.T. Instruction Formats of iitk-RISC. 

3,1 Instruction Format 


The different instruction formats of iitk-RISC instructions are shown in figure 3.1. All 
instructions of iitk-RISC are one word long and aligned at word boundaries in memory. 
The format of an instruction in the instruction set depends on the addressing modes it 
supports. 


3,2 Addressing Modes 

The iitk-RISC in accordance with the philosophy of RISC architecture supports small num- 
ber of addressing modes. Some fields in ’the instruction format carry same information for 

A 

all instructions whereas some other fields encode the information specifying the addressing 
mode used and hence are dependent on addressing modes. For control transfer instruction 
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relative addressing mode is provided and it lias been simplified enough for the implemen- 
tation purpose. All instructions use at least one operand which is specified in the field 
RSi. 

Register- Addressing The second source of two operands instruction is a register. This 
addressing mode is supported by all two operand instructions. Format for these 
instruction is shown in figure 3.1.c. 

Long- Immediate Addressing Second source of the instruction is 17 bit signed immediate 
data. The sign of immediate field is determined by SICi'.\'(bit<16> of Long-Immediate 
Instruction format) and immediate data is sign e.Ktended to 32 bit. This addressing 
mode is supported by two operand instructions with implicit destination. Format of 
these instructions is shown in figure 3.1.a.. 

Short-Immediate Addressing Second source of the instruction is 9 bit immediate data. 
The sign of immediate field is determined by SIGN(bit<8> of Short-Immediate In- 
struction format) and immediate data is sign extended to 32 bit. This addressing is 
supported by all two operands instructions. Formal of these instructions is shown in 
figure S.l.b. 

Register- Relative Addressing The effective address(EA) is calculated as 
EA=[RSi]- 1 -S 2 , S 2 can be a. register, 16 bit immediate offset, or 8 bit immediate offset. 
If S 2 is immediate offset then it is first converted to 32 bit signed immediate offset. 
This addressing mode is supported by all the control transfer and LO.AD/STORE 
instructions. 

PC-Relative Addressing If PC is used to calculate the effective address in place of RSi, 
then Register-Relative addressing becomes PC-Relative. The effective address is cal- 
culated as EA=[PC]+S 2 . S-z can be a register. 17 bit immediate offset, or 9 bit imme- 
diate offset data. Similar to previous addressing modes, S 2 is first sign extended to 
32 bit. In this addressing mode RSi is ignored because in effective address calculation 
PC is used as first operand of the instruction and RSi field of instruction can not be 
used as other then the first operand of the instruction. This addressing is supported 
by PC relative control transfer instructions. 
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Figure 3.2: The Flag Register of iitk-RISC. 


Data Manipulation Instructions do not support Long-Iniinediate Addressing as in this 
case the destination can not be .specified. 


3.3 Flags 

The flags in iitk-RISC are arranged in a form of a 32 bit register and can be accessed by 
the user. It is the implicit source/destination for two instructions. The flag register of 
iitk-RISC is shown in figure 3.2. 

The C, V, S, and Z flags are condition flags and reflect the result of data Manipulation 
instructions and flags are changed according to e.xplain below if SCC field in the instruction 
is set to 1 (bit<25> = l) and instruction can change the flags. 

Z:=[DEST==0]; S:=DEST<31> 

V:=0; in case of shift and logical instructions. 

V:=[32-bit 2’s-complement overflow occurred]; in case of arithmetic instructions. 

C:=0, in case of logical instructions. 

C:=RSi <shift_count-l>((shift_count-l )>0); in case of shift right instructions. 

C:=RSi <32-shift_count>( (32-shift. count )>0); in case of shift left instructions. 

C:=carry<31>to<32>(for RSi and .S 2 unsigned): in case of addition instructions. 

C:=NOT[borrow<31>to<32>](for RSi and S 2 unsigned); in case of subtraction instruc- 
tions. 

The I, P, and T flags are used for system control. If 1=0, no interrupt will be honored by 
the CPU. If P=0 then CPU is in privilege mode. The T flag is used to store the content of 
P in case of an interrupt. A normal user can access the flag register however, changing the 
flag register is allowed only in privilege mode. When the system is reset all flags are cleared 
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except T flag which is set. The system flag T is set heraiise at the end of the execution 
of reset program a reti instruction will be executed to transfer the control to some user 
program. 

3.4 Instruction Set 

In this section, the instructions have been grouped into five categories and the discussion 
of a particular category is applicable to all instructions of that category. 

3.4.1 Data Movement Instruction 

The only instruction that comes in this category is mov. In this instruction second source 
is redundant and ignored. No flag is changed even if SCC=1 because in moving content of 
one register to some other register no arithmetic, logic, or shit operation is involved. 

3.4.2 Data Manipulation Instructions 

This category includes instructions for addition, subtraction, logical operations and shift 
operations. All of these support Register- Addressing, and Short-Immediate Addressing. All 
the instructions except inv are two operand instructions. Instruction inv takes only one 
operand. Flags ai-e changed to reflect the result of Data manipulation in.struction if SCC 
bit is set. 

For shift instructions shift count is provided by the second operand. .As the shifts more 
then 31 bit are meaningle-ss only. 5 bit(<l;0>) are u.scd a.s shift count, \arious types of 
shift supported in iitk-RISC are illustrated in figure 3.3. 

3.4.3 LOAD/STORE Instructions 

This set of instructions is used to access inemoiw for read and write operations. Data can 
be accessed as a byte, a half word, and a full word. The width of the data being accessed 
is provided by two data-width-liues: Wo and W'l. 

In case of load instructions (.'PU always internally read.s full word. Separate load in- 
structions are provided for loading unsigned byte, half word, and full word. Similarly for 
loading signed byte, half word, and word sepai’ate instructions are provided. If read data 
is byte or half word then it is aligned and converted to either signed 32 bit or unsigned 32 
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SHR 


SAR 


Figure 3.3: Shift Operations. 


bit, depending on the signed or unsigned access of data, before writing into register. How- 
ever, in case of full word read operation data is written into register specified without any 
modification. An example of loading second signed byte is shown in figure 3.4 

The load type instructions support Register- Relative Addressing mode. However, the 
second field S 2 in effective address calculation can not be 16 bit immediate offset because 
load type instructions require a destination register. 

For write access of memory, separate instructions are provided for storing a byte, a half 
word, and a full word. The iitk-RISC CPU always outputs a full 32 bit word onto the bus. 
However, only some of the bytes in that word are to be written into the corresponding bytes 
of addressed word. The number of bytes to be written are determined by the instruction 
in execution. For example, in case of storb instruction only one byte will be written. For 
this iitk-RISC supports a memory system organized in byte banks. The bank(s) in which 
byte(s) is(are) to be written isfare) selected by W'o nnd Wi, .^0 nnd .A-i lines of address bus. 
The width code lines indicate the width of item to be written. The decoded meaning of 
these four lines is given in table 3.4.3. Figure. 3..5 shows an example of storing a half-word 
into effective-address <1:0>=10 

The store type instructions support Register- Relative Addressing mode where the effec- 
tive address is calculated as:eff-address=[RSi]-l-0. 

No instruction from this category modifies the flags irrespective of SCC bit. 
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Figure 3.4: Loading a Byte. 
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Table 3.1: Storing Data of Various M^idths. 
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Figure 3.5: Storing a Half-Word. 


3.4.4 Control Transfer Instructions 

Conditional jumps, Unconditional jumps, and Procedure call constitute this group. This 
set of instructions is used to translate decision boxes of flow chart of a program. There are 
two types of control transfer instructions provided. The first type of instructions support 
Register-Relative Addressing mode whereas second type of instructions support PC-Relative 
Addressing. The jump conditions are evaluated according to the table 3.2. A jump delayed 
by one cycle takes place if the condition evaluates to TRUE. The instruction slot next to 
the control transfer instruction is called dtlay slot of that control transfer instruction and 
the instruction in delay slot is alw^ays executed due to delayed control transfer. 

The call instruction differs from other instructions of this group in one respect. It pushes 
the address of the instruction that has been executed in delay slot into specified register. It 
will be used as a return address. Thus to get the correct return address the address pushed 
by caWinstruction should be incremented by 04h. This can be done in the control transfer 
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Table 3.2: The iitk-RISC Jump Conditions. 


instruction used to return from procedure or function call. Figure 3.6 shows an example. 

Instructions of this group do not modify the flags even if SCC bit is set. 

3.4.5 Miscellaneous Instructions 

In this group two types of instructions are included. Instructions reti, getlpc, and putpsw 
are privileged instructions and can be executed only if P flag is 0. The non- privileged group 
comprises getpsw and nop instructions. 

The reti instruction is used for return from interrupt handling routine. It restores the 
previous system operation mode(P flag) and loads the PC with the content of specified 
re^ster. The control transfer is delayed by one cycle. 

The getlpc must be the first instruction of any interrupt handling routine. It moves the 
content of LSTPC (the address of interrupted instruction) to register specified, which will 
be used to restart the interrupted instruction. 

The putpsw instruction is used to change the content of flags. Changing the flags C, S, 
Z, and V is meaningless because these flags change dynamically and no one can predict the 
content without actually analyzing the program. The result of instruction will be effected 
only after the end of execution cycle. 

The getpsw pushes the content of flags into the register specified. The nop instruction 
does nothing and generally used to fill the delay slot if compiler is unable to fill it with 
some meaningful instruction. Some times it used to introduce calculated amount of delay 
in program. 

No instruction of this group except putpsw changes the flags, putpsw changes flags even 


if SCC bit is dear. 
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MAIN PROGRAM: 

XXXXXXXX call Ra.Rs^Ri) 
+4h xor Rs.Rr.OSh 
^ + 8 h • • • 


\ SUBROUTINE: 
YYYYYYYY mov Ro,Rl5, . 
+4h • • • 


ZZZZZZZZZZ jmpr R 5 , . ,04h 


Figure 3.6: An Example of Procedure Call. 


Illegal Instruction : FFF'FF808h 
Address Violation : FFFFFSOOh 

Figure 3.7: Interrupt Vector for Illegal Conditions. 


3.5 Illegal Conditions 

There are some illegal conditions that may arise during the program execution. We have 
tried to detect all such illegal conditions that may affect the program execution. On detec- 
tion of an illegal condition the system changes its operating mode to privilege mode and an 
interrupt or exception handling routine is called. The instruction which causes the illegal 
condition is executed as nop. Figure 3.5 shows the interrupt vector for the two case of illegal 
conditions to be discussed. 

These illegal conditions are: Illegal Instruction and Memory Address violation(or Illegal 
Memory Address). 

3.5.1 Illegal Instructions 

There are three cases of illegal instruction. These are illegal opcode, privilege violation, and 
invalid field. An illegal instruction is executed as nop and an interrupt is raised. 
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Illegal Opcode 

The opcode field of iitk-RISC’s instruction is 6 bit wide. With 6 bit 64 combinations are 
possible. For iitk-RISC some of these combinations are meaningless, that is, no instruction 
is associated with them. If iitk-RISC CPU finds any of these combinations, it is treated as 
Illegal Instruction. The list of illegal combinations is as follows: 17h, iCh, IDh, lEh, iFh, 
20h, and 30h. 

Privilege violation 

Some of the instructions are protected from the execution by a normal user because exe- 
cution of these instruction may cau.se some serious problems during execution. To avoid 
execution of these instruction by a normal u.ser privilege flag(P) is checked before execut- 
ing these instructions. If the check succeeds then only instruction is inserted into pipeline 
for execution otherwise instruction is executed as nop and instruction treated as an illegal 
instruction. 

Invalid Fields 

Sometimes the CPU finds that a particular field of a instruction is missing, for example, 
destination register is not given. This is another case of illegal instruction and treated 
accordingly. 

3.5.2 Memory Address Violation 

To access a full word and a half word the last two bits of effective-address(<l:0>) must 
be 00 and 00/10 respectively. Because, in case of iitk-RISC words are aligned at word 
boundaries and half word at half word boundaries. If any memory request violates this 
restriction, the address will be corrected by converting these two bits to 00. Memory will 
be accessed using this address and an interrupt is raised. 

3.6 Evaluation of the iitk-RISC Instruction Set 

In This section we evaluate the iitk-RISC instruction set. We discuss its appropriateness 
for High Level Languages! HLL) and its impact on code size. 
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HLL Statement: 

iitk-RISC instruction.s: 

if (count-f-|-<n) 

sub R„,Ro,Rcounu^CC set 
jge targetjaddr 

Rcount ’ Rcounl "t" 1 

C->xyz==*abc 

load Ra - M[Rc-{-OFFS^y,] 
load Rt 2 < — ^[Rabc "h fi] 
sub Rti,Ro,Rt 2 ’,SCC set 

C=C->xyz 

load Rc ^ M[Rc+OFFSry:] 


Figure 3.8: HLL Translation into iitk-RISC Codes. 


3.6.1 Instruction Set and High Level Language 

In most of the cases iitk-RIS(' instructions are similar to a micro-instruction of a typical 
CISC computer. One can argue that the instruction set of iitk-RISC is “of too low level” 
for a High Level Language. 

However, several frequently used statements of HLL can be compiled into only a single 
or a few iitk-RISC instructions. Some examples are shown in figure 3.8 

Thus, iitk-RISC instructions are not far away from some very frequently used HLL 
statements. Also the variants of LOAD/STORE as discussed earlier enable the iitk-RISC 
to support almost all the data types without much overhead. 

3.7 Conclusion 

The instruction set of iitk-RISC is designed in such a way that every instruction can be 
executed in a single cycle but not at the cost of increasing the number of instruction required 
to translate a HLL statement. The code translated in simple instructions has more scope of 
optimization as compared to translated in complex instructions. Further, we feel 4 Giga byte 
memory space is enough to fit any program of these days. Also iitk-RISC has enough number 
of on-chip registers for compiler to keep almost all variables. Based on these considerations, 
in case of iitk-RISC relatively unrompacted code will not cost too much. In a nutshell we 
can say that the performance of iitk-RISC will be better then most of the CISC computers. 



Chapter 4 


The iitk-RISC Architecture and 
Pipeline 


This chapter discusses the architecture of iitk-RISC including instruction pipeline of iitk- 
RJSC, the delayed control transfer, and the internal forwarding mechanism. The basic 
timing for read, write, and fetch cycles of iitk-RISC is also discussed. The discussion of 
architecture and micro-architecture is intermixed for easy understanding. 

4.1 The Instruction Pipeline of iitk-RISC 

The instruction pipeline of iitk-RISC is a four stage pipeline and supports internal for- 
warding of data for inter-stage data dependency resolution. In this section we focus on the 
instruction pipeline and its impact on the iitk-RISC architecture. 

4.1.1 A Four Stage Pipeline 

Execution sequence of a simple instruction in a processor follows a very simple pattern. It 
consists of the following sequence of operations: Fetcli Instruction, Decode instruction. Fetch 
operands, Perform operation, and finally write result into destination. These operations do 
not require equal amount of time to finish. As a pipeline is more effective if all of its stages 
take roughly equal amount of time, some operations can be combined to provide operations 
of roughly equal time. For example instruction decode and fetch operands are combined in 
iitk-RISC. 

We assume that iitk-RISC will have an off-chip instruction cache and fast primary mem- 
ory for filling the instruction cache and data memory access. This enables fast instruction 
fetch. Further, some of the operations can not be made faster because of implementation 
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IF: INSTRUCTION FETCH DRF : DECODE and REGISTER FETCH 

EXEC : EXECUTION MEM: MEMORY REFERENCE 

WB: WRITE BACK RESULT 

Figure 4.1: Instruction Pipeline of iitk-RISC. 

limitations and high cost of implementation. For example, the implementation cost of carry 
lookahead adder increases exponentially if we go for a faster adder. Due to these reasons 
instruction decoding could not be clubbed with execution phase. In iitk-RISC, to make all 
the steps of the execution sequence of almost equal time, fetch instruction and decoding is 
done in the same cycle. Operands are fetched in parallel with instruction decoding. The 
basic timing for fetch cycle, to be discussed later, validates our assumption. 

Considering all above assumptions we have designed the iitk-RISC pipeline. It is a 
4 stage pipeline and shown in figure 4.1 

The instruction is fetched during IF part of jth cycle. After fetching the instruction 
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following actions are taken in parallel: 

1. Opcode part of the instruction is decoded. 

2. Immediate data/offsel is aligned and signed extended to 32 bit immediate data/offset. 

The 32 bit data/offset is stored in immediate register. 

3. Register operands are fetched. 

For fetching register operands and taking immediate data/offset from the instruction, 
it is assumed that an instruction has second register operand and 17 bit or 9 bit immediate 
data/offset field. The first register operand is present in all the instructions. The bit<8:2> 
of an instruction are used for register fetch even if bit<9> is 0. The bit<9> and bit<17> 
determine the presence or absence of .second register operand. The bit<16:0> are taken 
as long immediate data if bil<17> is set. If bil<17> is 0 then the bit<8:0> are taken as 
short immediate data. The operands to be used for an instruction are determined by the 
control signals generated by decode of an instruction as discussed in Section 3.4.1. After the 
decoding phase is over the necessary operand sources are enabled. The RSj is required by 
all the instructions as first operand whereas either RS 2 or immediate data/offsel is required 
as second operand. 

In (j+l)th cycle, the decoded instruction together with all the necessary operands is 
sent for execution. In execution phase of the execution cycle the execution unit performs 
the required operation on the operands of the instruction. The operation performs by the 
execution unit may be add/subtract, logical operation, or shift operation. At the end of 
(j+l)th cyde all flags are updated if SCC bit is set in the instruction and the instruction 
can modify the flags. 

After the execution stage gets over, memory access stage reads the result of the execution 
stage. If instruction under consideration involves memory access then a memory access cycle 
is initiated in (j+2)th cycle, otherwise memory access stage sits idle in (j+2)th cyde. At 
the end of (j-f2)th cycle it sends data to the write back unit, which may be data read from 
memory, result of the execution unit, or nothing depending on the instruction. For example 
in case of load type instruction memory access unit passes the accessed data, in case of data 
manipulation instruction it passes the data read from the execution unit, and in case of 
control transfer instructions nothing is passed. Inclusion of this stage as a regular feature 
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of the pipeline requires some explanation. In RISC 11 and some other processors, in which 
memory stage is not a regular feature of pipeline, memory access stage is included only 
when there is a memory access instruction in pipeline. We found that in such processors 
decision regarding the inclusion of memory stage is to be made either during decode of 
instruction or at the end of execution cycle and it should be known to write back stage well 
in advance so that write back unit can delay its action by one cycle. This increases the 
number of control signals and affects the regularity of the pipeline. 

In write back stage of pipeline, data read from memory access unit is written into 
appropriate destination if required by the instruction. The destination may be a register in 
register-file or flag register. 

4.1.2 Analysis of Pipeline 

A pipeline also has some negative effects on the performance of the processor if the archi- 
tecture is not designed carefully. In this subsection we have discussed the possible problems 
with a 4 stage pipeline of iitk-RISC and how these problems are taken care of. 

Pipeline Suspension During Data Memory Access 

If there is a memory access instruction already in pipeline, then a instruction can not be 
fetched during the cycle when this load instruction will be processed by memory access unit 
in the computer systems in which address and data buses are shared by instruction fetch 
and data access both. This is because the address and data buses are busy in doing data 
memory access operation in that cycle. This situation is shown in figure 4.2 

iitk-RISC in this situation pushes one bubble into the pipeline, by inserting a nop. There 
are several remedies to avoid this bubble inside the pipeline. Firstly, separate buses for data 
memory access and instruction fetch may be used. This choice though promising, requires 
a two port external memory and increased pin-count. However, the instruction fetch will 
never clash with data memory access. Secondly, data cache may be used. In this case time 
shared address and data buses can be used, that is, address and data buses will be used to 
fetch the instruction and data memory access in the same cycle. 
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Figure 4.2; Pipeline Suspension During Memory .Access. 


Delayed Control Transfer 

Till this point delayed control transfer! delayed jump/branch) has been used without giving 
any reason of its inclusion in pipelined architecture. In this section concept of delayed 
control transfer has been introduced formally. Consider the execution sequence shown in 
figure 4.3. 

During first part of jth cycle instruction Ij is fetched, which is a control transfer in- 
struction. It is decoded in second part of jth cycle. During cycle (j+1), the target address 
for control transfer instruction is computed and the necessary boolean condition is also 
evaluated. The boolean condition is evaluated during the first part of (j-l-l)th cycle. Along 
with the execution of Ii, a new instruction I 2 is also fetched and inserted into pipeline for 
decoding and operand fetch because till the end of (j-l-l)th cycle the execution of control 
transfer instruction is not completed (target address has not been computed). At the end of 
execution phase the target address of control transfer instruction is available and in case of 
conditional jump type instructions the decision about the successful or unsuccessful jump 
has been taken. If control is to be transferred then target address of the control transfer 
instruction is loaded into the PC in the end of (j-f-l)th cycle and the fetching now can be 
done from the target of control transfer instruction from cycle (j-l-2) and onwards. 

The instruction I 2 , which is already in pipeline, may be fiushed out of the pipeline or left 
inside the pipeline. The first solution is not attractive because CP U has already decoded I 2 
and flushing it will waste one cycle. Further flushing the pipeline will require extra control 
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j j+l j+2 



Figure 4.3: Delayed Control Transfer. 


and will complicate the flow of pipeline. The iitk-RISC uses the second option because it 
does not require any other extra hardware and no clock cycle is wasted. 

The effect of leaving I 2 as such is that the control transfer is virtually delayed by one 
cycle hence the name delayed control transfer. The instruction slot I 2 is referred to as 
the Delay Slot of instruction h. The instruction in delay slot will always be executed 
and compiler can put an instruction in delay slot which does not affect the decision of the 
control transfer is not going to be effected. Most of the time compiler will be able to find 
a meaningful instruction suitable for delay slot but occasionally compiler will insert a nop 
instruction there. Empirical results have shown that the compiler is able to fill the delay 
slot of unconditional control transfer instruction about 90% of the time and delay slot of 
conditional control transfer instruction for the 40% to 60% of the time [5, 9]. 

If two or more control transfer instructions are executed in consecutive cycles then 
the control will be transferred to the target address of the last executed control transfer 
instruction. However, instruction at the target address of control transfer instruction is 
also executed in the delay slot of subsequent control transfer instruction. This situation is 
illustrated in figure 4.4. 
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90--- 

100 jmp 150 
124jmp 200 
128--- 


150 add • • • ;l3; delay slot of I2 

154 : 

200 jmp 300 ;l4 , control will be here after Ii and I2 

204 jmp 360 ;I5, delay slot of I4 

208 


300 jmp 90 ;I6, delay slot of I5 

304 ••• 

Figure 4.4: The Effect of Consecutive Jumps. 

Ii: loadw Ri,R37R2 
I2: add R^.R^jRg 
I3: sub R5 ,Rt,R 3 
I4: sub R5,. ..,R7 

Figure 4.5: An Example Requiring Internal Forwarding. 

The Internal Forwarding 

In pipelined architecture an instruction is executed in every cycle and a new instruction is 
inserted into the pipeline without waiting for the earlier instruction to finish. Consider the 
program segment shown in figure 4.5. 

In non-pipelined execution there is no problem in executing this code segment but in case 
of pipelined execution the effect of this program segment will not be as expected because 
operand registers R5 and R3 of I3 will be fetched from the register file before they have 
been written. That means, the sub instruction will operate on old content of R5 and R3. 
Same is the case with I4. To get consistent result, in pipelined computer, the result of the 


;finaly control will be here 

;l2, delay slot of li 
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instructions already in pipeline is forwarded to subsequent instructions. This is referred to 
as Internal Forwarding. 

In iitk-RISC the places at which internal forwarding is done are shown in figure 4.1. 
Data manipulation instructions, data movement instructions, and control transfer instruc- 
tions require internal forwarding from memory access unit to execution unit and write back 
unit to execution unit. Load/Store type instructions require internal forwarding from write 
back unit to memory access unit in addition to above two. 

4.2 The iitk-RISC Architecture 

The basic block diagram of iitk-RISC is shown in figure 4.6. In subsequent sections every 
block is explored up to its circuit level design and design parameters are also discussed for 
the same. The implementation details of all the units discussed here can be found in next 
chapter. 

4,2.1 The Register-File 

iitk-RISC has 128 registers of 32 bit each and all registers are orgaiuzed in a group and as 
a whole they are referred to as Register- File. The register file has two ports for reading and 
writing operations. The porti is attached to system Bus A and port 2 is attached to system 
Bus B through register RA and register RB respectively. For reading operation a port is 
used. Simultaneously two registers can be read as there are two ports available. In writing 
operation data is read from porti of register file which is demultiplexed and attached to 
system BUS D through RD. Registers RA and RB are the output buffer registers whereas 
RD is input buffer register for register-file. 

The register Rq is hardwired to contain zei’o and writing into Ro is allowed without 
any effect. This is a special purpose register and may be used as the destination of the 
instruction in which user does not want to change the content of any of the registers, for 
example to compare two registers a subtract instruction is used with destination of the 
result of this subtraction instruction as Ro- The read and write operations on the same 
register simultaneously are avoided by carefully designing the pipeline. Refer to figure 4.1, 
the operands of an instruction are read during low part of the cycle (clock is low) but 
the result of an instruction is written only during high part (clock is high) of the cycle. 
Therefore reading and writing operations on register-file will not clash. 




Figure 4.6: The iitk-RISC Block Diagram. 
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The Register Cell 

A register is nothing but the replication of a basic cell shown in figure 4.7. The cell is a 
simple CMOS D-latch with two line.s for reading and a line for writing. The choice of D- 
latch over static RAM-cell is critical one. In case of RAM-cell the design of sense amplifier 
is critical and slight over or under design can affect the data read. The way D-latch is 
implemented costs almost same area as static RAM-cell. 

Register Address Decoder for Register-File 

Seven address lines are required to address all 128 registers of register-file. The iitk-RISC 
uses two register address decoders corresponding to two ports of register-file. Separate 
decoders are used for two register read operations and the first register address decoder 
is also used for writing operation. A simple AND-tree decoder is used for designing the 
decoder. Figure 4.8 shows an AND-tree decoder for 3 address lines. 

4.2.2 Execution Unit 

The execution unit of iitk-RISC comprises of a 32 bit add/sub unit, a logical unit for 32 bit 
operations, and a shifter for 0 to 31 bit shift operations. The execution step of iitk-RISC 
execution pipeline is performed by this unit. It takes two inputs from two temporary 
registers (It is assumed that input will always be available in these two registers before 
any operation is started to be performed by this unit) and after performing the operation 
requested for result is stored in DST-reg. At the end of computation ALU returns either 
the old value of condition flags or the modified values of flags. If SCC bit of instruction is 
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ABC 



OFF or instruction is not permitted to modify the flag (even SCC bit is 1) then old content 
of condition flags is returned otherwise modified content is returned. 

Adder /S ubt ractor 

We have chosen a carry-lookahead adder because the addition and subtraction operations 
are to be done in a single cycle. A carry lookahead adder consists of a carry generation block 
which generates all the carry bits (or block of these) in parallel and an add block which 
performs the actual addition. We have used a modified version of normal carry-lookahead 

j 

adder because of reduced complexity and implementation reasons. 

Consider the two numbers A and B , and a number 5 such that 5 = A + 5. Let A*, 
Bi, Si, and C,* be the bit i of number A, bit i of number B, bit i of sum S, and carry out 
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of sum Si respectively. The carry out of stage i may be expressed as 

Ci = Gi + Pi.Ci-i 

where Pi = A,- 0 £,•, carry propagation signal. 

Gi = A, .jB,-, carry generation signal, 
then Si = Af 0 B,- 

We can define a new operator o which has the following function: 
o ia',?') = {9 + {P-9'),P-P') 

where g,p,g'-,p' are boolean variables. The carry signal can be determined by 
Ci = Gi 


whcrc[Gi, Pi] 


(9}-P\)o{go.Po) ifi = l 

(9i,Pi}- • • o • • • (Ci_i, Pi_i ) if 2 <i<n 


The operator o is associative and allows the processing elements to be embedded in a 
binary tree [11]. Such a carry generation block is shown in figure 4.9 for 8 bit numbers and 
an input carry Cq. The adding block consists of XOR gates. 

Other details can be found in [11]. The iitk-RISC has two 16 bit blocks of the adder 
discussed above. The adder is appended with an XOR gate for each bit to get the one’s 
complement of second operand in case of subtraction operation and normal addition is 
performed with input carry Co is set to 1. In case of addition operation XOR gates pass 
the bits of second operands as such. 


Logical-U nit 

The logical-unit of iitk-RISC is capable of performing bitwise AND, OR, and XOR of two 
32 bit numbers and bitwise inversion of a 32 bit number. The logical unit is designed by 
replication of the basic cell shown in figure 4.2.2. 
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BINARY TREE REVERSE BINARY TREE 



Figure 4.9: A Binary Carry Lookahead Adder. 

Shifter 

A barrel shifter is used to shift 32 bit number read, from BUS A by the amount given by 
the shift-count (shift-count<32) read from BUS B. To get the shift-count five low order bits 
of TEMP 2 register are used. The design of barrel shifter consist of a basic shift block, a 
shift count decoder, and some extra hardware to make the basic block capable of doing all 
type of shift operations. The basic shift block is a 32x32 matrix of pass transistors and it 
is capable of doing right shift operation. The basic shift block has two input planes and a 
output plane. The control signals are fed in vertical direction, inputs(datato be shifted) are 
given diagonally, and outputs(shifted data) are taken out in horizontal plane. The output 
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Figure 4.11: The Memory Access Unit of iitk-RISC. 


of store type instruction it outputs the data together with the information about the width 
and the memory banks to be enabled as discussed in Chapter 3. The basic architecture 
of memory access unit is shown in figure 4.11 and the timing diagrams for read and write 
cycle are shown in figure 4.12. The request signal shown in read and write cycles is used 
internally and shown here for clarity. 

4.2.4 Write Back Unit 

The basic architecture of write back unit is shown in figure 4.13. The write back unit 
takes the input from memory access unit during low part of cycle and writes the result into 
register-file or flag register during high part of cycle. 

4.2.5 Interrupt Unit 

The block diagram of interrupt unit is shown in figure 4.14. The interrupt request may 
come from an external device or internal unit. Internal request can come from control unit 
for illegal instructions or memory access unit for address violations. The following actions 
are taken by the interrupt unit whenever it receives an interrupt request on one of the 
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Figure 4.13: The Write Back Unit of iitk-RISC. 
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Figure 4.14: Interrupt Unit and Interrupt Cycle. 


interrupt lines: 

• On int signal: [P]— f’T, [signature register]-* LSTPC, P-*- 1. 

• Further fetching of instruction is suspended and a nop instruction is inserted into the 
pipeline. 

• 8 bit of interrupt offset is generated either internally or after reading the 8 interrupt 
lines and using this interrupt offset interrupt vector is calculated. 

• PC is loaded with the interrupt vector just computed, and fetching of instruction is 
resumed from that address. 
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Interrupt line INT is checked in the begining of every cycle for the presence of interrupt 
request from external device but other internal interrupts are processed whenever detected. 
The interrupt due to memory address violation arises during the high part of cycle(clock 
is high). However, interrupt due to illegal instruction arises during low part of cycle(clock 
is low). When an interrupt request from external device or memory adders violation is 
detected an interrupt cycle is initiated. In case of interrupt request due to illegal instruction 
a short length interrupt cycle is initiated. This short length cycle takes half of the cycle 
to complete. Interrupt cycle once started can not be stopped and no further request from 
external device is entertained. The interrupt cycle completes at the end of current cycle. 
In figure 4.14 an interrupt cycle is shown. INTA is the acknowledgement signal for INT. 
External interrupts are maskable and can be disabled by clearing the 1 flag. The two internal 
interrupts are unmaskable but interrupt due to invalid instruction has low priority. Interrupt 
due to memory address violation ha.s high priority. The interrupt vector is computed a.s 
: interrupt-vector=FFFFF800h +offsetx8h. The interrupt table of iitk-RISC starts from 
memory address FFFFFSOOh and extends up to FFFFFFFFh. Two consecutive full words 
are reserved at each interrupt vector location as the first instruction here is an unconditional 
jump and due to delayed control transfer the next instruction is also executed. The interrupt 
vector locations FFFFFSOOh and FFFFF808h are used for memory address violation and 
illegal instruction respectively. 

4.2.6 Control Unit of iitk-RISC 

The structure of control unit is shown in figure 4.15. Control logic is divided into parts and 
almost all the units of iitk-RISC have a small control section. However, most of the control 
logic is inside control unit. This has reduced the number of control lines considerably. In 
this section fetching and execution process of instruction in iitk-RISC is discussed. 

Fetching a Instruction 

The part of control unit responsible for fetching instruction includes the Program Counter 
(PC), the instruction register, and the signature register. The PC is a 32 bit counter and 
bits<l:0> of the address generated by it are always 00. This is because all iitk-RISC 
instructions are aligned at word boundaries and memory is organized in byte banks. The 
fetch control first identifies whether address and data buses are free or not. Buses may be 
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Figure 4.15: Control Unit of iitk-RISC. 


occupied by memory access unit for memory access operation, which is given first priority 

because a pipeline stage can not be delayed. To see whether buses are free or not, fetching 

control accesses the busy bit of memory access unit. If busy bit is set the buses are not free 

otherwise buses are assumed to be free. If buses are not free then fetch cycle is not initiated 

and a nop is inserted into the pipeline. In other case, when buses are found to be free, a 

fetch cycle is initiated. The iitk-RISC fetch cycle is shown in figure 4.16. 

During the fetch cycle the content of PC is released onto the address bus and signal is 

sent to inform the memory that a fetch cycle is in progress. The content of PC is pushed into 

the signature register and the data bus is read after sending the RD signal. The instruction 

♦ 

is send for decoding and operand fetch. If int signal is received by the control unit from 
interrupt handling unit the fetch cycle just initiated is suspended and nop is pushed into 
the pipeline. 
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Figure 4.16: A Fetch Cycle ofiilk-RISC. 


Control Signals Generation 

The control signal generation part of control unit consists of instruction register, control 
PLA, and control register. The instruction is fetched from memory into instruction register. 
The instruction register is designed with simple D-latch with clear signal. Previously at 
several places we have mentioned that on occurrence of some particular signal a nop is 
pushed into the pipeline. The nop is pushed into pipeline by clearing the instruction register. 
We have kept the opcode of nop OOOOOOOOh intentionally for the same reason. Signature 
register works like an identification tag to a instruction register. The signature register has 
the memory address of a instruction that is in instruction register and is loaded during the 
instruction fetch. The signature of the instruction is used by several instructions. These 
include all the PC-Relative control transfer instructions. If control unit gets int signal from 
the interrupt unit, then the LSTPC is loaded with the content of signature register. 

Following is the list of the control signals that are required to manage the instruction 
execution in iitk-RISC. 

• Control signals required by control unit. 

1. Co: A valid instruction. 

2. Ci: Load P flag with the content of T flag. 

• Control signals required by execution unit. 

1. A_i,Ao: Execution unit input source for first operand. 00:RA, 01:Signature 
register, 10:flag register, 11:LSTPC. 
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2. Ai,A 2 : Execution unit input source for second operand. 00:RB, Olnmmediate 
register, 10;Z-register. 

3. A 3 ,A 4 : Execution unit subunit activation code. 01:add/sub unit, lOdogic unit, 
ll'.barrel shifter. 

4. As, Ae: Execution unit operation code. 

00:add,and,shr; 01:sub,or,shl; 10:adc,xor,sar; ll:sbb,not. 

5. Ay: Flag control line. 0:condition flags will not be changed, l:Condition flags 
will be changed. 

6. Ag; Send PC content to memory access unit in. low part of cycle. 

• Control signals required by Memory Access Unit. 

1. Mq: Enable/Disable unit. 

2. Mi: Operation type. 0:load. l:store. 

3. M 2 : Signed/Unsigned data. 0:unsigned. l:signed. 

4. M 3 ,M 4 : Data width. 01:byte, 10:half word, ll:full word. 

5. Ms: Data currently with execution unit will be written in register file. Used for 
internal forwarding mechanism. 

6. Mg: Send data received from execution unit to write back unit. 

7. My: Send data accessed from memory to write back unit. 

8. Ms: Send PC content received from execution unit to write back unit. 

• Control signals required by write back unit 

1. Wq: Enable/Disable unit. 

2. Wj: Data currently with memory access unit will be written in register file. Used 
for internal forwarding mechanism. 

3. M 2 : Destination of data. l:register-file, 0:flag register. 

If internal forwarding is done, then some of the signals lose their meaning. These 
signals are (A-i,Ao) and (Ai,A 2 ). If internal forwarding is to be done for first operand 
then A_i and Ao loose their meaning. If internal forwarding is to be done for second 
operand then Ai and A 2 loose their meaning. In case internal forwarding is to be done 
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for both operands then (A_i,Ao) and (A].A 2 ) all loose their meaning. Signals Mj and Wi 
are used to control internaJ forwarding. If one or both register operands of an instruction 
are also the destination register of the previous instructions and are not written back yet 
as the instruction has not passed through write back unit, then modified data is internally 
forwarded and not fetched from register-file. The internal forwarding is done in the following 
steps: 

a. First the location for internal forwarding is identified as follows: 

• The instruction whose destination matches with instruction currently in decoding 
process is identified, say 1. If there are two such instructions inside the pipeline 
then the instruction nearer to the entry end is taken as instruction I because it 
is the instruction whose result will be the final content of the register. 

• The location of instruction I is identified, say L. 

• The location of instruction 1 in pipeline when internal forwarding will be done to 
provide the correct data to the current instruction is identified. It will be I, -t- 1 
when internal forwarding is to be done from write back unit to execution unit 
and L + 2 when it is to be done from write back unit to memory access unit. 

b. After identification of the place of the instruction I control signals Ms or Wi are 
checked of the instruction I. The control signal Ms is checked if instruction I currently 
is in execution phase of execution cycle and Wj is checked when instruction I is in 
memory access phase of execution cycle. After this internal forwarding control signals 
are generated using above information and the information about the equality of 
source register of current instruction and destination register of instruction /. These 
signals will be used to forward result on BUS A. BUS B, or on Data bus(store type 
instruction). 

All control signals are generated during the decode phase of execution cycle. A PLA 
is used to generate all the control signals. The generated control signals are stored in a 
control register and the signals are issued to each stage of the the pipeline in the order of 
execution cycle. 
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Control PLA and Control Register 

We have used a CMOS NOR-NOR dynamic PLA for designing iitk-RISC control logic. The 
CMOS NOR-NOR PLA is the fastest among all CMOS PLAs because of parallel connected 
NMOS-transistor in AND and OR planes. 

There are 14 inputs to the PLA and 25 outputs are generated. Inputs are opcode(7), 
see bit(l), field identification bits(2), P flag(l). Conditional flags(4). The opcode, SCC bit 
and field identification bits are taken from the instruction to be decoded and currently in 
IR. 

Output control signals, destination regi.ster address, and information about internal 
forwarding are stored in control register. Control register is a three stage register. The 
information flows from stage 1 to stage 2 to stage 3. Each stage is made of master-slave 
CMOS D-latch with clearing facility. The first stage has 35 cells. These cells are used 
for storing destination register address! 7). internal forwarding control signals for internal 
forwarding from MEM stage to EXEC stage(2), internal forwarding control signals for 
internal forwarding from WB stage to EXEC stage(2), internal forwarding control signal for 
internal forwarding from WB stage to MEM stage! 1). control signals for write back unit(3), 
control signals for memory access unit(9), and control signals for execution unit(ll). The 
second stage has 20 cells. These cells are used for storing destination register address(7), 
internal forwarding control signal for internal forwarding from WB stage to MEM stage(l), 
control signals for write back unit!3), and control signals for memory access unit(9). The 
third and final stage has 10 cells. The cells are u.sed to store destination register address(7), 
and control signals for write back unit! 3). The first stage issues the control signals to 
execution unit, second stages issues signals to memory access unit while the third stage 
issues the control signals to write back unit. The internal forwarding control signals are 
sent to the unit whose result is to be internally forwarded. The control signals remain active 
throughout the cycle but are used by their controlled unit either in low or high part of cycle. 
The control register of iitk-RISC is shown in figure 4.17. 

4.3 Evaluation of iitk-RISC Architecture 

Here we have listed down some of the important characteristics of the iitk-RISC to conclude 
our discussion on the architecture of iitk-RISC. 
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Figure 4.17; The control register of iitk-RISC. 


• A simple architecture. 

• A 4-stage regular pipeline. 

• Simple control mechanism. 

• Balanced read and write cycle. 

• Fast fetch mechanism. 

• Fetching of operands in parallel with decoding. 





Chapter 5 

The iitk-RISC Design and Layout 


This chapter deals with the micro architecture of iitk-RISC. After a detailed description 
of the data path, layout design issues for different units are discussed. Actual layout for 
different units are also given. 

5.1 The iitk-RISC Data Path Design 

For compact layout the data path should be designed carefully. The general form of data 
path is a direct consequence of the pipeline scheme employed and the instruction set. The 
basic form of a data path is that a gate is driving some other gates through a resistive 
and capacitive path. Several designs of data path are possible. These include a precharged 
path, self amplified path, a static data path. The precharged path requires a multiphase 
clocking scheme. The data path are precharged to high logic but here discharging of data 
path is a critical issue. That means the NMOS transistor of the gate driving the data path 
should be designed carefully. Otherwise in fast operation on data path will not be possible 
because the driver will not be able to transfer 0 logic. Self amplified or self adjusted data 
path are suitable for low speed high capacitive and resistive data path. In this scheme pull 
ups and pull downs are used. The static scheme is very critical from the designer point of 
view. In this scheme the driver, generally a inverter, is designed in such a manner so that 
it is able to charge and discharge the data path in specified time. Generally data path is 
divided into segment and a driver is used for driving every segment to reduce the loading on 
driver. There is always a trade off between driver size- and the segment length a driver can 
drive. To drive a long segment with a reasonable switching time a fat gate will be required. 
It is generally used for high speed operations. In iitk-RISC we ha-ve used static data path 
scheme. Though it costs us mucli in terms of area but works satisfactorily. The segment of 
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data path for time critical signal is kept short. A typical iitk-RISC Data path segment is 
shown in figure 5.1, this data segment drives the input side of execution unit. 

5.2 The iitk-RISC Data Path 

Figure 5.2 presents the iitk-RISC data path which consists of the following: 

• Register-file: 128 registers of 32 bit each, with its dual-port address decoder and with 
latches RA, RB, RD. Rq is hardwired to contain 0. 

• Flag register: A 32 bit register. It consist of conditional flags(S,Z,V,C) and flags for 
system management(I,P,T). 

• Immediate Register: A 32 bit register and holds the immediate fields of instruction 
after sign extension to 32 bit data/offset. 

• Z-register: A 32 bit register hardwired to contain 0 aird used as second operand of 
some instructions internally. 

• TEMPi, TEMP 2 : 32 bit temporary registers used as the input registers for execution 
unit. 

• DST register: A 32 bit temporary register and used to store the result of execution 
unit. 

• Execution Unit: The execution unit take input form TEMPi and TEMP 2 and output 
is sent to DST register. The execution unit comprises of ALU and a shifter. The 
shifter uses 5 LSBs of TEMP 2 as shift-count. 
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• IR: It holds the instruction fetched during fetch cycle. Opcode held of instruction is 
decoded and other fields are used to fill immediate register and/or used to fetch the 
registers. Destination register address is send to control section. 

• Signature: A 32 bit register is used to store the memory address of the instruction 
currently in IR. 

• PC: The program counter, which holds the address of the instruction to be fetched in 
current cycle. 

• LSTPC: The Lasl-PC register, which holds the content of Signature when int signal 
is active. 

• Memory Access Unit: This unit is used to access the memory in case of load and 
store type instructions. It reads effective address from DST data to store from RB 
and accessed data is sent to RD. 

• BUS A, BUS B: The register-file read buses. Data traverse from RA, RB to other 
units. 

• BUS D: The register-file write bus. Data traverse from other units to RD. 

• SYSTEM ADDRESS BUS: The address of memory during data access or instruction 
fetch is send across this bus. 

• SYSTEM DATA BUS: The data or instruction are fetched over this bus. 

• Interrupt Offset Lines: Interrupt ofiTset is read on the request of external interrupt. 

• busOUT: The ofF-chip data and address bus. 

• CONTbusOUT: The off-chip control signal bus. The off-chip control signals include 
WR/over/tneRD, INT, INT, Wo,\Vi, and interrupt offset lines. 

5.2.1 Paths Followed for Instruction Execution 

In pipelined architecture instruction passes through all the stage of the pipeline during 
the instruction execution. The result of the stages of the pipeline is stored or buffered in 
intermediate latches. Data is forwarded from the output latch of one stage to input of the 
next stage. As discussed earlier the iitk-RISC pipeline is a 4 stage pipeline. During the 
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course of execution of instruction data followed some specified path to pass through the 
pipeline. Data path of iitk-RISC is presented in figure 6 . 2 . There ai'e a few activities that 
may be going on in the data path during each cj'cle. 

• The two source operands of instruction are routed to the execution unit. 

• The output of execution unit or of the content of signature register is routed to memory 
access unit or to the PC. 

• The output of memory access unit is routed to write back unit or(and) external 
memory interface. The output of PC is routed to write back unit in case of call 
instruction. 

• The output of write back unit routed to register-file or flag register. 


The two sources, Sj and S2, are routed on different buses or data path. Source Si follows 
the BUS A and it can be any of the data sources attached to BUS A. These sources are RA, 
Signature register, flag register, LSTPC or internally forwarded result of the instruction 
already in pipeline. Source S2 follows the BUS B and it can be any of the data sources 
attached to BUS B. These sources are RB. immediate register, Z-register, or internally 
forwarded result of the instruction already in pipeline. The particular choice of Si and 
S2 depends on the instruction. For example, getlpc instruction requires LSTPC as Si and 
Z-register as S2- The two sources are latched in the input registers TEMPi and TEMP2 of 
execution unit respectively. Which are used as the input for all binary operations whereas 
TEMPi is the input source of unary operation inu. The result of execution unit is forwarded 
to the output register of execution unit. 

The output of execution unit, DST register, is routed to memory access unit or to the 
program counter, if the instruction is control transfer instruction then the DST register is 
routed to PC otherwise the content of DST register is forwarded to memory access unit. 
The other input source to the memory access unit is register RB content delayed by 2 clock 


cycles if instruction is store type. If instruction is load type then the other source is memory 
access interface or system DATA BUS. 

The output of memory access unit is routed to write back unit or(and) memory access 
interface. If instruction is store type then output of memory access unit is forwarded only to 
memory access interface. If instruction is load type system 
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ADDRESS BUS to memory access interface and accessed data is sent to write back unit. 
In other cases the output of memory access unit is forwarded to write back unit. If internal 
forwarding is required then output is also forwarded to the input latch(es) of the execution 
unit. The output of memory access unit is wTitten into register file or flag register. Some 
instruction do not have any destination field in these cases write back unit do not forward its 
output to any of the unit. The output of memory access unit may be internally forwarded 
to the input latch(es) of execution unit or(and) memory access unit. Internal forwarding is 
discussed in chapter 4. 

5.3 The Design Issues and Layouts 

In this section the design issues of iitk-RISC layout are discussed. The discussion is de- 
pended on the discussion of previous chapter. We have included the layouts of different units 
and building blocks of these units. The simulation results of some time critical units are also 
given. The simulation in most of the cases is done with SLS( Switch Level Simulator) [4]. 
However, some critical parts are simulated with SPICE [10]. 

The layout of iitk-RISC is illustrated in figure -5.3. The floor plan is overlaid for easy 
understanding. In the design of different units some of the registers are duplicated to save 
the space used by metal wires. 

5.3.1 The Register-file 

In iitk-RISC register-file occupies 39.32% of the whole designed area. The register file is 
divided into two banks of 64 registers each to make the layout suitable for connecting with 
other units and to optimize the access time. The critical delay path of register file is shown 
in figure 5.4. 

If read signal is enable then the decoded signal is applied to selected register. The 
decoded signal will charge 32 output pass transistor(decoder line) of jth register to enable 
the latched data in jth register. This data follows the output lines(bit lines) of the register- 
file. The following discussion is given for only one of the decoder lines required to enable 
NMOS transistor of a transmission gate. 

Let tjt and Ut be the delay associated with decoder line and bit line respectively, that is 
at least t^t will be required to enable the decoder line and for bit line this time is tj,*. The 
access time of a particular register will be ta=t<fj-t-t(,t. In case of the design of re^ster-file 
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t(ii and tit are the two design parameter. The delay idt depends on size of inverter IV^j and 
the decoder. However, the delay due to decoder can not be changed very much. The delay 
tit depends on the size of inverter IVij and pass transistor Py. However, effect of Ptj is 
not much because there are two transistor(a transmission gate) in parallel to charge the bit 
line. For low access time tdj and tij should be reduced. In our design the values of t,^- and 
tij are 7ns and 13ns respectively. We have got these values by careful design of the two 
critical inverters and by reducing the length of poly lines. The decoder lines are taken as 
first metal whereas bit lines are taken as second metal. The bit lines are amplified at the 
output of register-file to avoid further degradation in bit quality. 

The layout for register-file together with one register cell is shown in figure 5.5. 

5.3.2 Execution Unit 

The execution unit of iitk-RJSC is most lime critical unit because the longest delay path is 
inside in this unit and is shown in figure 5.6. 

The length of this path is approximately 45 gates. It starts from the output of TEMP 
register and passes through XOR gate, lower carry generation block, upper carry generation 
block, add block and zero detector(zero detector finds whether result of execution unit is 
0 or not). That is Z flag is the last signal that will be available from execution unit. The 
worst case delay for this path is appro.ximately 51ns. 

The layout for execution unit is illustrated in figure 5.7. 

5.3.3 Memory Access Unit 

The memory access unit of iitk-RISC is designed keeping in mind that no instruction cache 
is provided. Due to this a full cycle is used for memory access. For laying compact layout 
the memory access unit is not located at one place and some of its registers are duplicated to 
avoid long wires. The internal forwarding buses for internal forwarding from any other unit 
to execution unit are kept with execution unit. This way we have avoided the long run of 
wires and duplication of data paths. For internal forwarding from write back unit to memory 
access unit a duplicate register is used inside the memory access unit. The dupUcation 
of register at most of the places is done for easy implementation of internal forwarding 
mechanism. The layout for main part of memory access unit is shown in figure 5.8. 




5.3 The Design Issues and Layouts 


54 


5.3.4 Write Back Unit 

This is the simplest unit in iitk-RISC and required least design effort. It consist of two input 
register corresponding to two sources and a output register with two ports corresponding 
to two destinations of final result. If result is to be written into any one of the destination 
then a write signal is sent to that unit and corresponding output port is also enabled. The 
layout of this unit is given in figure 5.9. 

5.3.5 The iitk-RISC Control Unit 

In this section we discuss the organization of iitk-RISC and the design issues in control logic 
design. The control logic is designed with a CMOS NOR-NOR dynamic PLA. The main 
issue in designing the control unit of iitk-RISC is to minimize the number of control signals. 

The organization of iitk-RISC’ control path is given in figure 5.10. Decoding of in- 
struction in iitk-RISC is easy because of the simple instruction format with fixed fields of 
fixed size and positions. The decoding is further simplified because of the small number 
of addressing modes supported in iitk-RISC. The assumption that all possible operands 
are available with an instruction allow iitk-RISC to fetch all the operands in parallel with 
decode of instruction and require no control for it. 

The instruction is fetched in IF phase. The opcode field together with condition flags, 
privilege flag and the fields of instruction specifying the addressing mode used are decoded 
by the control logic. At the end of decoding all the 25 control signals are pushed into the 
control register. Control signals for internal forwarding are also generated in parallel with 
decoding and pushed into control register. These internal forwarding signals may or may 
not enable a particular type of internal forwarding. The control register, a 3 stage register, 
issues control signals to all the units of iitk-RISC. 

The control logic is designed to enable fast decoding of instructions and is divided 
into three parts. Each part is implemented using a CMOS NOR-NOR dynamic PLA. We 
observed that condition flags are updated only at the end of cycle and the control signals 
affected by these flags are required in the end of cycle. Due to this, these control signals can 
not be generate during the decoded phase. Keeping in view of this fact the control signals 
used to load the PC are generated by separate decoding logic and it is one of the three 
parts mentioned earlier. The other two parts are used to generate control signals for control 
unit and execution unit, and memory access unit and write back unit respectively. This 
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way we have reduced the tolaJ mimber of niinterni required to implement the control logic. 
Further reduction in total number of minterm i.s achieved by multivalued optimization using 
espresso [3]. A total of 72 minterms are implemented in all the three PLAs to realize the 
control logic of iitk-RISC. The layout of control unit is given in figure 5.11. 

5.4 Conclusion 

The iitk-RISC is implemented in c3tu process with minimum feature size 1.6/im. Total 
size of the processor implemented is 7.85 x 7.25mm (56.95mm^) without pads. This can be 
easily integrated in a wafer. Total number of pins requiied are 82. 
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Figure 5.2: Data Path of iitk-RISC. 
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Figure' 5.7: 'I'lio Hxocut.ioii Unit, of iitk-HJSC 
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Figure 5»9i The Write Hack Unit of iilk-ltlSt'. 
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Figure 5.10: Control Path of iitk-RISC. 









5.4 Conclusion 


65 



Chapter 6 

Conclusions 


RISC hcis become a mainstream movement to improve computing power and keep cost 
of design and design time down. It utilizes the chip area and high bandwidth of on-chip 
communication effectively. In this thesis a 32 bit VLSI RISC architecture is designed and 
referred as iitk-RISC. 

In chapter 2 some of the existing RISC architectures are surveyed. Following are the 
interesting features of these RISC processors. 

• All are 32 bit architecture. 

• All are pipelined architecture and use a 2 to 4 stages pipeline. 

• The number of on-chip registers are decreasing. Some new architecture has few on 
chip registers. 

. • The size of instruction set and number of addressing modes supported are increased. 

• The complexity of the processors is also increased considerably. 

• Delay jump concept is u.sed by almost all the RISC architectures. 

• On-chip or off-chip instruction and data caches are used by almost all processors. 

The above findings led to the formulation of the architectural definition of iitk-RISC. 
Some of the good features of the existing RISC processors are taken while designing iitk- 
RISC. 

In Chapter 3 the instruction format, addressing mode and instruction set of iitk-RISC are 
discussed. The iitk-RISC supports an instruction set with fixed fields of fixed size and posi- 
tion. The instruction formats are Register- Register, Long-Immediate and Short-Immediate. 
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A particular type of instruction format is meaningful only with its counterpart addressing 
mode. The iitk-RISC instruction set supports Register-Register, Short-Immediate, Long- 
Immediate, Register-Relative, and PC-relative addressing mode. With a few exceptions all 
Data manipulation instructions support Register- Register and Short-Immediate addressing, 
whereas control transfer instructions support Register- Relative and PC-Relative address- 
ing. Instruction set has 57 instructions and it includes instructions for register to register 
data transfer, data manipulation, memory access and some miscellaneous operations. The 
memory map of iitk-RISC is 4 Giga byte and organized in bytes banks. Instructions are 
provided to read a signed or unsigned byte, a signed or unsigned half word, and a full word 
from memory. Similarly instructions are provided to write a byte, a half word, and a full 
word into memory. The iitk-RISC CPU has two modes of operation namely privilege and 
non- privilege. The current mode of CPU is determined by a system flag P. 

The pipeline of iitk-RISC, described in chapter 4. is a four stage pipeline. The stages 
of pipeline include Instruction fetch and decode, execution. Memory access, and WB(write 
back). Unlike other RISC processors memory access stage is a. regular feature of iitk-RISC 
pipeline. In control transfer instructions control transfer is delayed by one instruction to 
avoid bubble inside the pipeline. 

Instruction is fetched during the IF part of cycle and decoded by control section of control 
unit. Control section includes control register and control logic, control logic generates all 
the control signals that may be required during the execution of an instruction. These 
control signals together with control signals for internal forwarding are pushed in control 
register. Control register is a three stage register and uses the control signals to the diflferent 
stages of pipeline during the execution. 

The register-file has 128 registers of 32 bit each. Register Ro is a special type of register 
and always contains 0. Read and write operations on it are allowed. The execution unit 
is capable of performing 32 bit operation in single cycle. Conditions flags are modified 
according to the result of operation performed by execution unit if SCC bit of instruction is 
set and allowed for the instruction. All memory access are done by memory access unit. If 
instruction is not a memory access instruction then it introduces a latency and at the end 
of cycle result is forwarded to write back unit. The write back unit writes the forwai'ded 
data into re^ster-file, flag register, or no where depending on the destination of the result, 
which is determined by the instruction in execution. 
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Item 

Chip area(mm^) 


Transistors 

%Transistors 

Register file 

22.39 

39.32 

62080 

74.22 

Execution unit 

7.76 

13.62 

5740 

6.86 

Memory access unit 

4.46 

7.84 

2962 

7.08 

Write back unit 

2.52 

4.43 

2096 

2.50 

Control unit 

3.52 

5.03 

6712 

8.03 

Others 

0.34 

0.58 

1086 

1.30 


Table 6.1: Design Metrics for iitk-RISC. 

Interrupt unit processes the interrupt requests from off- chip devices, memory access unit 
on illegal memory address, or control unit on illegal instruction. Off-chip request can be 
disabled by clearing the condition flag I. Interrupt from memory access unit has the highest 
priority. 

In chapter 5 implementation issues are discussed. The implementation issues includes 
design of data path, different functional units, and control path. The control unit of iitk- 
RISC occupies only 6.18% of area. For fast operation of iitk-RISC control logic is designed 
considering the fact that all the control signals are not required at the same moment. This 
enables us to break the control logic in parts. Some other optimization technique are also 
used to further reduce the size of control logic. 

In table 6.1 the design metrics of iitk-RISC is given. The layout for iitk-RISC is de- 
signed using cStu process, double metal CMOS technology, keeping A at 0.20/u(1.6At 
technology). The total number of pins required are 82. 

The iitk-RISC architecture can be made more effective by making some extensions to 
its present architecture. Some of the suggestions are given during the analysis. We have 
listed down all of them and some other possible extensions. 

• On-chip instruction cache may be used instead of off-chip. 

• On-chip or off-chip data cache may be used. 

• memory management unit(MMU) may be used. 

• In the last, Support for specialized I/O devices like DMA, graphics controller may be 
provided. 














Appendix A 

Instruction Set for iitk-RISC 


The iitk-RISC instructions are full word long and reciuire one cycle to execute when pipeline 
is full. The first two columns in table fi.l give the mnemonic and a brief description of 
each instruction. The third column gives the opcode of an instruction. The fourth column 
indicates the addressing modes that may be .supported by the instruction using the following 
symbols: 

RR Register- Addressing 
LI Long-Immediate Addressing 
SI Short-Immediate Addressing 
RL Register- Relative Addressing 
PL PC-Relative Addressing 

The fifth column indicates the possible flag settings using the following symbols: 


-Not affected 

0 Cleared to 0 

1 set to 1 

X set or cleared according to the result of execution unit if SCC=1 
•w set or cleared according to data to be written by WB-unit 
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Mnemonic 

Description 

opcode 

Addressing 

Modes 

flags 

C S Z 

V 

adc 

Add with carry 

02H 

RR,SI 

X X 

X 

X 

add 

Addition 

OlH 

R.R,S1 

X X 

X 

X 

and 

Logical AND 

OSH 

RR,SI 

0 X 

X 

0 

caJlp 

Call Soubroutine 

16H 

PL 

- 

- 

- 

callr 

Call Soubroutine 

15H 

RL 

- 

- 

- 

getlpc 

Get last PC' 

19H 

RR 

- 

- 

- 

getpsw 

Get PSW 

18H 

RR 

- 

- 

- 

inv 

Logical NOT 

OSH 

RR,S1 

- 

- 

- 

jcp 

Jump on Carry 

38H 

PL 

- 

- 

- 


Jump on Carry 

28H 

RL 

- 

- 

- 

jep 

Jump if Equal 

3CH 

PL 

- 

- 

- 


Jump if not Equal 

2CH 

RL 

- 

- 

- 

jgep 

Jump if C-ireater 
or Equal 


PL 

" 

“ 

- 

jger 

Jump if Greater 
or Equal . 

23H 

RL 

• 

*■ 


jgtp 

Jump if Greater 

3]H 

PL 

- 

- 

- 

jgtr 

Jump if Greater 

21 H 

RL 

- 

- 

- 

jhip 

Jump if Higher 

35H 

PL 

- 

- 

- 

jhir 

Jump if Higher 

25H 

RL 

- 

- 

- 

jlep 

Jump if Less or 

Equal 

32H 

PL 




jler 

Jump if Less or 

Equal 

2211 

RL 

* ** 

■ 


jlosp 

Jump if Lower or 
Same 

3()H 

PL 

- 

■ 

■ 

jlosr 

Jump if Lower or 
Same 

26H 

RL 


* 


jltp 

Jump if Less than 

34H 

PL 

- 

- 

- 

jltr 

Jump if less than 

24 H 

RL 

- 

- 

- 

jmpp 

Jump always 

3FH 

PL 

- 


- 

jmpr 

Jump Always 

2FH 

RL 

- 

- 

- 

jncp 

Jump if No Carry 

37H 

PL 

- 

- 

- 

jncr 

Jump if No carry 

27H 

RL 

- 

- 

- 

jnep 

Jump if Not Equal 

3BH 

PL 

- 

- 

- 

jner 

Jump if Not Equal 

2BH 

RL 

- 

- 

* 

jnp 

Jump if positive 

3AH 

PL 

- 

- 

- 

jnr 

Jump if negative 

2AH 

RL 

- 

* 

- 

jnvp 

Jump if No Overflow 

3DH 

PL 

- 

- 

- 

jnvr 

Jump if No overflow 

2DH 

RL 

- 


- 

jPP 

Jump if Positive 

39H 

PL 

- 

* 

- 

jpr 

Jump if Positive 

29H 

RL 

- 


- 

jvp 

Jump if Overflow 

3EH 

PL 

- 

- 

- 

jvr 

Jump if Overflow 

2EH 

RL 

* 
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Mnemonic 

Description 

opcode 

Addressing 

Modes 

C 

Rags 

S z 

V 

loadbs 

Load Signed byte 

OFH 

RR,SI 

- 

- 

- 

- 

loadbu 

Load Unsigned byte 

lOH 

RR,SI 

• 

- 

- 

- 

loadhs 

Load Signed Half 

Word 

ODH 

RR,S1 





loadbu 

Load Unsigned 

Half Word 

OEH 

RR.Sl 

' * 




loadw 

Load full Word 

OCH 

RR.Sl 

- 

- 

- 

- 

mov 

Move register 

14H 


- 

- 

- 

• 

nop 

No operation 

OOH 


- 

- 


" 

or 

Logical OR 

06H 

RR,SI 

0 

X 

X 

0 

putpsw 

Load Flag register 

lAH 


w 

w 

w 

w 

reti 

Return From Interrupt 

IBH 


- 


- 

- 

sar 

Shift Arithmetic Right 

OBH 

RR.Sl 

X 

X 

X 

0 

sbb 

Subtract with Borrow 

04H 

RR,SI 

X 

X 

X 

X 

shl 

Shift Logical Left 

09H 

RR.Sl 

X 

X 

X 

0 

shr 

Shift Logical Right 

OAH 

RR,S1 

X 

X 

X 

0 

storb 

Store a byte 

13H 

RR 


- 

- 

- 

storh 

Store a Half Word 

12H 

RR 

- 

- 

- 

- 

storw 

Store a Word 

IIH 

RR 

- 

- 

- 

- 

sub 

Subtractioir 

03H 

RR,SI 

X 

X 

X 

X 

0 

xor 

Logical XOR 

07H 

RR.Sl 

0 

X 

X 
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Appendix B 

Basic Modules of iitk-RISC 
Layout 



ell; execution unit_cell 


0.0 79.89 
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Cell: PLAcells 
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Cell: gates 


0.0 


28.29 
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Cell : register_f ile_cells 


0.0 


39.89 
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